-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert apache2.access to ECS #8901
Conversation
@@ -4,8 +4,8 @@ | |||
"grok": { | |||
"field": "message", | |||
"patterns":[ | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -" | |||
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webmat The first IPORHOST
is interesting. Looking at the grok pattern it can be an IP or HOSTNAME (the grok definition of HOSTNAME, no port): https://github.com/elastic/elasticsearch/blob/6.4/libs/grok/src/main/resources/patterns/grok-patterns#L31 So far our test logs seem to contain always IP address here. Could this be different?
If it is a hostname, we would also have an issue further down with running geoip on it I assume. This means probably the pattern here should be modified to just be IP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%h: Remote hostname. Will log the IP address if HostnameLookups is set to Off, which is the default. If it logs the hostname for only a few hosts, you probably have access control directives mentioning them by name. See the Require host documentation.
You're right that this can be a hostname. Although I would err on the side of not modifying how the module previously worked, otherwise this is no longer an ECS conversion of a module, it becomes an overhaul of the module. And we're talking about doing them all in the next few days...
So in this case, the module probably has problems when people have hostnames there. I would argue this ECS conversion should is unrelated to fixing this problem.
I have no problem if you open issues for problems with this module, though :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just got into a rabbit hole based on this.
So yes, source can be a host (prior to determining if you have an IP literal or a hostname) there. I also saw that apache2.access.remote_ip
was previously stored as keyword
, which would have supported this. So unless we decide to change this field name (elastic/ecs#166), source.hostname
is indeed where you should store this.
Now if this module was doing GeoIP on this field, I guess I would go the extra step and conditionally duplicate this IP to source.ip
(after confirming this is an IP), and doing GeoIP on that field instead. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache2.access.agent
: why not store directly in user_agent.original
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about apache2.access.agent
, changed.
"fileset.module": "apache2", | ||
"fileset.name": "access", | ||
"http.request.method": "GET", | ||
"http.response.body_sent.bytes": "209", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@webmat I suggest to put this under http here even though it's not in ECS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- It's true that people should be able to nest things freely under ECS fields. However, us doing so in these first conversions may confuse people, no?
- Plus, we'd be introducing a breaking change for a field that will change again in the future, once we settle on a final ECS name.
- The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under
apache.access
until we've added this to ECS.
- The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under
@webmat This is not complete yet but worth starting a dicussion on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great exercise. There's a lot of things to discuss here :-)
"fileset.module": "apache2", | ||
"fileset.name": "access", | ||
"http.request.method": "GET", | ||
"http.response.body_sent.bytes": "209", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few things:
- It's true that people should be able to nest things freely under ECS fields. However, us doing so in these first conversions may confuse people, no?
- Plus, we'd be introducing a breaking change for a field that will change again in the future, once we settle on a final ECS name.
- The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under
apache.access
until we've added this to ECS.
- The way this field is named is not an easy sell like "let's just add it as is to ECS", for me. Because with web servers you can track at least 6 different sizes: request header, body or total size. Same thing for response. So here I would stick with the previous field name under
"offset": 73, | ||
"prospector.type": "log" | ||
"prospector.type": "log", | ||
"source.hostname": "::1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source of this HTTP request is the remote.
@@ -4,8 +4,8 @@ | |||
"grok": { | |||
"field": "message", | |||
"patterns":[ | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -" | |||
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I just got into a rabbit hole based on this.
So yes, source can be a host (prior to determining if you have an IP literal or a hostname) there. I also saw that apache2.access.remote_ip
was previously stored as keyword
, which would have supported this. So unless we decide to change this field name (elastic/ecs#166), source.hostname
is indeed where you should store this.
Now if this module was doing GeoIP on this field, I guess I would go the extra step and conditionally duplicate this IP to source.ip
(after confirming this is an IP), and doing GeoIP on that field instead. WDYT?
@@ -4,8 +4,8 @@ | |||
"grok": { | |||
"field": "message", | |||
"patterns":[ | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:apache2.access.method} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version}\" %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-)( \"%{DATA:apache2.access.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", | |||
"%{IPORHOST:apache2.access.remote_ip} - %{DATA:apache2.access.user_name} \\[%{HTTPDATE:apache2.access.time}\\] \"-\" %{NUMBER:apache2.access.response_code} -" | |||
"%{IPORHOST:source.hostname} - %{DATA:user.name} \\[%{HTTPDATE:apache2.access.time}\\] \"%{WORD:http.request.method} %{DATA:url.original} HTTP/%{NUMBER:http.version}\" %{NUMBER:http.response.status_code} (?:%{NUMBER:http.response.body_sent.bytes}|-)( \"%{DATA:http.request.referrer}\")?( \"%{DATA:apache2.access.agent}\")?", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apache2.access.agent
: why not store directly in user_agent.original
?
30477c6
to
c8e5936
Compare
edc913a
to
df4bf12
Compare
b626286
to
ed04efb
Compare
* Update ecs-migration.yml file * Update changelog * Update generated files * Link old fields Todo: * Add aliases for old fields
6fa3efd
to
3e7658b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave this a careful look, and there's a few things to revisit (more details inline):
- Rebase needed
- Can't alias
apache2.access.remote_ip
(ecs-migration.yml) - Create aliases in the module's field definitions
- HTTP metrics comment (not a deal breaker)
- source.hostname not in ECS
- referer typo
- user_agent parsing currently broken
"rename": { | ||
"field": "apache2.access.agent", | ||
"target_field": "apache2.access.user_agent.original", | ||
"target_field": "user_agent", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your unparsed user_agent is no longer in the apache2.access.agent
, so currently the UA parser skips for all of your events. If you look at your JSON expected log results, you'll see they no longer include the parsed user agent :-)
However as I've discovered, you can't start by setting user_agent.original
, and then have the UA parser output to user_agent
. It won't merge its results around user_agent.original
, rather it will overwrite everything there. So you'll have your parsed UA but no longer the original.
The process has to be
- Extract original UA to temporary field
- Perform UA parsing with default target_field (default is already
user_agent
) - Move original UA from temporary field to
user_agent.original
# Filebeat modules | ||
|
||
## Suricata module | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll have to rebase from master, to get the Suricata source_ecs
replaced
## Apache | ||
|
||
- from: apache2.access.remote_ip | ||
to: source.ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This alias cannot stay there, since the semantics are different.
In the grok you split this field towards two possible fields. In most cases the ip field will be the one that gets populated, but if HostnameLookups
is set to On
, part of your events will only populate source.domain
.
I think there's value in keeping the ambiguous field around (and therefore not making it an alias), and doing the split towards IP and domain anyway.
@@ -4,8 +4,26 @@ | |||
Apache2 Module | |||
short_config: true | |||
fields: | |||
- name: http.response.body_sent.bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to include the HTTP size metrics in ECS. But I don't think they'll look like that. The metrics available are typically request header size, request body size, response header size, response body size. And for both request and response, one could also be interested in having a total size. I'm not saying all of those need to make it to ECS necessarily. But given all of these options, I doubt the response.body_sent.bytes
will stick around as is.
I say we should hash that out first in ECS, and revisit all affected modules afterwards. Personally I'd leave as previous for now.
But not a big deal either way. Once we've figured this out in ECS, we'll revisit here anyway :-)
description: > | ||
The number of bytes of the server response body. | ||
|
||
- name: source.hostname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source.hostname
is no longer in ECS.
description: > | ||
test | ||
|
||
- name: http.request.referer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ECS doesn't reproduce the typo ;-) It should be http.request.referrer
- from: apache2.access.geoip.* | ||
to: source.geoip.* | ||
alias: false | ||
copy_to: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you get around to actually creating the aliases in the module's field defs, you can check out how I did it for the other modules (example). The definitions are so regular that with judicious find & replace, you can get them adjusted to this module very quickly.
Closing in favor of #9245 |
Todo: